Improved Topic Classification Model Using K-norm Base
Authors
Abstract
The Maximum Entropy (MaxEnt) model has proven to be a very effective approach to the topic classification task, in which a specific topic from a pre-defined topic set is assigned to each sentence. Although it was originally developed from the motivation of maximizing the conditional probability entropy under certain constraints, the MaxEnt model is in fact an exponential distribution model that maximizes the log-likelihood of the training data. This log-likelihood criterion resembles the classification accuracy criterion, which is the ultimate performance measure of a topic classifier. But the two criteria still differ, and their discrepancy reduces the benefit that optimization brings to classification accuracy. In this paper we propose to replace the log-likelihood objective used in the MaxEnt model estimation process with different objective functions that are closer to the classification accuracy criterion. Specifically, we propose a Summation-Log K-norm objective and a Summation K-norm objective. Our experiments, conducted on two large-volume topic classification datasets, demonstrate the effectiveness of our new objectives in improving topic classification performance over the state-of-the-art MaxEnt model.
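The gap between the log-likelihood criterion and the classification accuracy criterion can be illustrated with a minimal sketch. It assumes a toy two-topic dataset, hand-picked weights, and hypothetical helper names (maxent_probs, log_likelihood, accuracy), none of which come from the paper. It does not implement the proposed K-norm objectives, whose definitions are not given in this abstract.

```python
import math

def maxent_probs(weights, features):
    """p(topic | sentence) under an exponential (MaxEnt) model.

    weights:  {topic: {feature: weight}}  (illustrative values, not from the paper)
    features: {feature: value} for one sentence
    """
    scores = {
        topic: sum(w.get(f, 0.0) * v for f, v in features.items())
        for topic, w in weights.items()
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {topic: math.exp(s - top) for topic, s in scores.items()}
    z = sum(exps.values())
    return {topic: e / z for topic, e in exps.items()}

def log_likelihood(weights, data):
    """Sum of log p(correct topic | sentence) over the data."""
    return sum(math.log(maxent_probs(weights, x)[y]) for x, y in data)

def accuracy(weights, data):
    """Fraction of sentences whose most probable topic is the correct one."""
    hits = 0
    for x, y in data:
        probs = maxent_probs(weights, x)
        if max(probs, key=probs.get) == y:
            hits += 1
    return hits / len(data)

# Toy data (an assumption for illustration): bag-of-words features and a topic label.
data = [
    ({"flight": 1}, "travel"),
    ({"hotel": 1}, "travel"),
    ({"stock": 1}, "finance"),
    ({"flight": 1, "stock": 1}, "finance"),
]

# Model A: small weights, every sentence classified correctly by a narrow margin.
weights_a = {"travel": {"flight": 0.2, "hotel": 0.2}, "finance": {"stock": 0.4}}
# Model B: large weights, three confident hits and one narrow miss.
weights_b = {"travel": {"flight": 5.0, "hotel": 5.0}, "finance": {"stock": 4.8}}

for name, w in (("A", weights_a), ("B", weights_b)):
    print(name, "log-likelihood:", round(log_likelihood(w, data), 3),
          "accuracy:", accuracy(w, data))
```

In this toy setting, model B reaches a higher log-likelihood than model A while classifying fewer sentences correctly, which is the kind of mismatch between the two criteria that motivates an objective closer to accuracy.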
Similar resources
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a variety of library resources. However, classifying documents within a large amount of data remains an issue, and finding specific documents demands time and energy. Classifying similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, the production of text documents has grown exponentially, which is why their proper classification seems necessary for better access. One of the main problems in classifying text documents is working in a high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
Critical Drag Investigation for an Axisymmetric Projectile with Choked Base Bleed at High-Subsonic and Transonic Regime Using SST K-ω Model
In this paper, the effects of a choked jet exhausted from the base of a non-lifting body on its total and base drag in the subsonic and transonic regimes have been numerically investigated. After surveying the results of several turbulence models and comparing them with experimental results, an appropriate turbulence model, i.e. SST K-ω, has been chosen and this model has been used in the su...
Publication date: 2006